[pull] main from llvm:main #5546

Merged
merged 2,592 commits into Ericsson:main from llvm:main
Feb 6, 2025

Conversation

pull[bot]
@pull pull bot commented Jan 16, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Jan 16, 2025
tbaederr and others added 29 commits February 5, 2025 08:04
…#124754)

Enable device code ASan instrumentation for openmp offload applications
using option '-fsanitize=address'.
…124247)

Profiling of the LLVM Test-suite reveals that a significant portion of
loop nests, specifically 14,090 out of 139,323, were identified as
non-viable candidates for transformation, leading to the transform
exiting from isComputableLoopNest() without any action.

More importantly, dependence information was computed for these loop
nests before reaching the function isComputableLoopNest(), which does
not require DI and relies solely on scalar evolution (SE).

To enhance compile-time efficiency, this patch moves the call to
isComputableLoopNest() earlier in the control-flow, thereby avoiding
unnecessary dependence calculations.

The impact of this change is evident on the compile-time-tracker, with
the overall geometric mean improvement recorded at 0.11%, while the
lencode benchmark gets a more substantial benefit of 0.44%.
This improvement can be tracked in the isc-ln-exp-2 branch under my
repo.
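
The reordering described above amounts to running the cheap viability check before the expensive analysis. A minimal Python sketch of that control flow follows; the function and parameter names are hypothetical stand-ins for illustration, not the pass's actual C++ API.

```python
def try_transform(nest, is_computable, compute_dependences, transform):
    """Run the cheap check first; compute dependences only for viable nests.

    All callables are hypothetical placeholders for the real pass logic.
    """
    # Cheap check (stands in for the SE-only isComputableLoopNest()).
    if not is_computable(nest):
        return False  # bail out before any dependence analysis is done
    # Expensive step, now reached only for viable loop nests.
    deps = compute_dependences(nest)
    return transform(nest, deps)
```

With this ordering, a nest that fails the viability check never triggers the dependence computation at all.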
…turns (#125280)

This patch does two things.

1. Previously, when checking driver arguments, we emitted an error for
unsupported values of `-mbranch-protection` when using pauthtest ABI.
The reason for that was ptrauth-returns being enabled as part of
pauthtest. This patch changes the check against pauthtest to a check
against ptrauth-returns.

2. Similarly, check against values of the following function attribute
which are unsupported with ptrauth-returns:
`__attribute__((target("branch-protection=XXX")))`. Note that the existing
`validateBranchProtection` function is used, and current behavior is to
ignore the unsupported attribute value, so no error is emitted.
…fixes.

Re-enables compact-unwind support in JITLink, which was reverted in b04847b
due to buildbot failures.

The underlying cause for the failures on the buildbots was the lack of
compact-unwind registration support on older Darwin OSes. Since the
CompactUnwindManager pass now removes eh-frames by default we were left with
unwind-info that could not be registered. On x86-64, where eh-frame info is
produced by default, the solution is to fall back to using eh-frames. On arm64
we simply can't support exceptions on older OSes.

This patch updates the EHFrameRegistrationPlugin to remove the compact-unwind
section (__LD,__compact_unwind) when installed, forcing use of eh-frames when
the EHFrameRegistrationPlugin is used. In LLJIT, the EHFrameRegistrationPlugin
continues to be used for all non-Darwin platforms, and will be added on Darwin
platforms when a CompactUnwindRegistrationPlugin instance can't be created
(e.g. due to missing support for compact-unwind info registration).

The lit.cfg.py script is updated to check whether the host OS's default unwind
info supports JIT registration, allowing tests to be disabled for older Darwin
OSes on arm64.
This library is provided by flang, not MLIR, so it should not be part of
MLIR_LIBS.

Fixes an issue introduced in #120966.
For convenience this patch drops nsw for `sub`. It also allows this fold
with `ctlz_zero_undef`.
Alive2: https://alive2.llvm.org/ce/z/VmvqSt
This is a test library which is not part of libMLIR, so it should use
normal LINK_LIBS instead of mlir_target_link_libraries.

This fixes an issue introduced in #123910 and follows up on the fix in
#125004, which added the library to DEPENDS, which is not sufficient.
Changes:
1. Fix inconsistencies in register pressure set printing. "Max Pressure"
   printing is inconsistent with "Bottom Pressure" and "Top Pressure".
   For the former, register class begins on the same line vs newline for
   latter. Also for the former, the first register class is on the same
   line, but subsequent register classes are newline separated. That's
   removed so all are on the same line.

   Before:
     Max Pressure: FPR8=1
     GPR32=14
     Top Pressure:
     GPR32=2
     Bottom Pressure:
     FPR8=7
     GPR32=17

   After:
     Max Pressure: FPR8=1 GPR32=14
     Top Pressure: GPR32=2
     Bottom Pressure: FPR8=7 GPR32=17

2. After scheduling an instruction, don't print pressure diff if there
   isn't one. Also s/UpdateRegP/UpdateRegPressure. E.g.,

   Before:
     UpdateRegP: SU(3) %0:gpr64common = ADDXrr %58:gpr64common, gpr64
                 to
     UpdateRegP: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 390, 12
                 to GPR32 -1

   After:
     UpdateRegPressure: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 12
                        to GPR32 -1
3. Don't print excess pressure sets if there are none.
Previously the grammar tokens SimpleValue2 through SimpleValue9 were
unreferenced. This ties them together so that the grammar makes more
sense.
This commit moves the rotate builtin to the CLC library.

It also optimizes rotate(x, n) to generate the @llvm.fshl(x, x, n)
intrinsic, for both scalar and vector types. The previous implementation
was too cautious in its handling of the shift amount; the OpenCL rules
state that the shift amount is always treated as an unsigned value
modulo the bitwidth.
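
As a rough illustration of the shift-amount rule described above, the following Python sketch models a rotate-left on a fixed bit width, reducing the amount modulo the width in the way a funnel shift with both inputs equal would. The function name `rotl` and the default 32-bit width are assumptions for illustration and are not taken from the CLC sources.

```python
def rotl(x: int, n: int, bits: int = 32) -> int:
    """Rotate the low `bits` bits of x left by n, with n taken modulo `bits`."""
    mask = (1 << bits) - 1
    x &= mask
    # Python's % yields a non-negative result for a positive modulus. For a
    # power-of-two width this matches treating a negative n as its unsigned
    # value modulo the bit width.
    s = n % bits
    # When s == 0 the right shift moves every bit out, leaving x unchanged.
    return ((x << s) | (x >> (bits - s))) & mask
```

For example, `rotl(1, 33)` and `rotl(1, 1)` give the same result, since 33 modulo 32 is 1.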
PR #124961 adds intrinsics for the tcgen05
alloc/dealloc PTX instructions. This patch
adds NVVM Ops for the same.

Tests are added to verify the lowering to
the corresponding intrinsics in tcgen05-alloc.mlir file.

PTX ISA link:
https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-memory-alloc-manage-instructions

Signed-off-by: Durgadoss R <[email protected]>
The Fortran libraries are not part of MLIR, so they should use
target_link_libraries() rather than mlir_target_link_libraries().

This fixes an issue introduced in
#120966.
These were referring to nonexistent grammar tokens instead of `Value`.
Removed the TOSA quantization attribute used in various MLIR TOSA
dialect operations in favour of using builtin attributes.

Update any lit tests, conversions and transformations appropriately.

Signed-off-by: Tai Ly <[email protected]>
Co-authored-by: Tai Ly <[email protected]>
This PR moves the maximum number of threads in a block and blocks in a
grid to the nvgpu dialect to avoid duplicated code.

The limits are defined here:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/#features-and-technical-specifications-technical-specifications-per-compute-capability
Make lifetime management more explicit. We're only using this for
CXXPseudoDestructorExprs for now but we need this to handle
std::construct_at/placement-new after destructor calls later anyway.
When building mlir with `-DMLIR_NVVM_EMBED_LIBDEVICE=ON`, there will be
a warning
```
build/tools/mlir/lib/Target/LLVM/libdevice_embedded.c:1: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘143’ to ‘-113’ [-Woverflow]
```
which is followed by a large number of characters in stdout.

Fix this to avoid printing a large number of characters (about 3e5) to stdout.
The previous implementation had false positive/negative cases in the
analysis of the loop carried dependency.

A missed dependency case is caused by incorrect analysis of address
increments. This is fixed by strict analysis of recursive definitions.
See added test swp-carried-dep4.mir.

Excessive dependency detection is fixed by improving the formula
for determining the overlap of address ranges to be accessed. See added test
swp-carried-dep5.mir.
If the input contains an odd number of shuffled vectors, the last 2
shuffles are shuffled with the same first vector. This situation must be
handled correctly: when the first vector is requested for the first
time, extract it from the source vector; when it is requested the
second time, reuse the previous result. The second vector should be
extracted in both cases.

Fixes #125269

Reviewers: topperc, preames

Reviewed By: preames

Pull Request: #125693
krzysz00 and others added 29 commits February 6, 2025 11:38
There were a bunch of spots in ROCDL.td where we were defining our own
llvmBuilder call which could have been generated using the default
built-in one on LLVM_IntrOpBase.

This commit cleans up such usages in the interests of potentially
enabling ROCDL import in the future and of making best practices more
obvious.

The one breaking change is renaming WaitcntOp to SWaitcntOp, which
should have minimal impact.
…Cost with the input operand type, not the result

Typo in #121216

Fixes #126085
To establish a baseline for new tests mentioned in
#117785, adding them here
independently.
The threadset clause is part of free-agent threads, but free-agent threads
also involve runtime implementation. So a separate entry was added for
the threadset clause and claimed for clang.

---------

Co-authored-by: Sunil Kuravinakop <[email protected]>
If BOLT fails to locate DWO CU when using split DWARF, this signifies an
issue with the input (missing .dwo) rather than an internal assertion.
…-V (#125905)

This is a follow-up to 5df62bd. That
commit should not have needed to make the vector.insert and
vector.extract conversions to SPIR-V directly handle the static poison
index case, as there is a fold from those to ub.poison, and a conversion
pattern from ub.poison to spirv.Undef, however:

- The ub.poison fold result could not be materialized by the vector
dialect (fixed as of d13940e).
- The conversion pattern wasn't being populated in VectorToSPIRVPass,
which is used by the tests. This commit changes this.
- The ub.poison to spirv.Undef pattern rejected non-scalar types, which
prevented its use for vector results. It is unclear why this restriction
existed; a remark in D156163 said this was to avoid converting "user
types", but it is not obvious why these shouldn't be permitted (the
SPIR-V specification allows OpUndef for all types except OpTypeVoid).
This commit removes this restriction.

With these fixed, this commit removes the redundant static poison index
handling, and updates the tests.
Summary:
This probably wasn't the intended result, but the code here causes
OpenMP to always link in `ockl.bc` which was intentionally not linked.
This results in the OCKL definitions conflicting with the OpenMP ones
and also prevents them from being optimized out (Might be fixed with
newer ROCm that actually builds the visibility correctly).

I'm pretty sure the only reason this didn't break the tests is because
we're smart and pass `-nogpulib` there to keep the environment from
being poisoned with stuff like this.
…125732)

Reapplies #122471

This is based on #125699, only
the latest commit is relevant.

With changes in this PR and the parent one, the previously reported
failures in the Fujitsu(*) test suite should hopefully be resolved (I
verified all the 14 reported failures and they pass now).

(*) https://linaro.atlassian.net/browse/LLVM-1521
Use CmpPredicate::getMatching in isImpliedCondBalancedTypes to pass
samesign information to isImpliedViaOperations, and teach it to call
CmpPredicate::getPreferredSignedPredicate, effectively making it
optimize with samesign information.
…r uses (#124327)

Remove the restriction that scheduling rematerialization candidates
cannot have virtual reg uses.

Currently, this only allows for virtual reg uses which are already live
at the rematerialization point, so bring in allUsesAvailableAt to check
for this condition. Because of this condition, the uses of the remats
will already be live in to the region, so the remat won't increase
live-in pressure.

Add an expensive check to check this condition.
These aliases are never used, so we can ditch them.
…ing definition (#126061)

Having them defined out-of-line results in a significant amount of
boilerplate without improving readability, since they're just one or two
lines long anyways.

As a drive-by, add comments between the declarations to make them easier
to distinguish.
…(y - z) + z even with Zicond. (#125772)"

With the test changes.

Original message:

The Zicond version of this requires an li instruction and an
additional register.

Without Zicond we match this in a DAGCombine on RISCVISD::SELECT_CC.

This PR has 2 commits. I'll pre-commit the test change if this looks
good.
- Use `Emitter::OptClass` to invoke `InstrInfoEmitter::run` and
eliminate the `EmitInstrInfo` function.
Since line zero means "no line information", when symbolizing a location
(an address or an inline frame associated with the address) that has a
line zero location, we shouldn't include other irrelevant data (like
filename) in the result.
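
A minimal sketch of the idea, assuming a simple record type: once a location's line is zero, the remaining fields are dropped rather than reported. The `LineInfo` class, the `sanitize` helper, and the `"??"` placeholder are illustrative assumptions, not the symbolizer's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class LineInfo:
    # Hypothetical stand-in for a symbolized source location.
    filename: str
    line: int
    column: int


def sanitize(info: LineInfo) -> LineInfo:
    """Drop unrelated data when the location has no line information."""
    if info.line == 0:
        # Line zero means "no line information", so the filename and
        # column are not meaningful and are replaced by placeholders.
        return LineInfo("??", 0, 0)
    return info
```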
#125923)

NaryReassociate would crash on expressions like the one in the added
test that involved pointers where the size of the type was greater than
the index width of the pointer, causing calls to SCEV's zext expression
on types that didn't need to be zero-extended.

This commit fixes the issue.
…25629)

LLVM itself is generally moving away from using `undef` and towards
using `poison`, to the point of having a lint that catches new uses of
`undef` in tests.

In order to not trip the lint on new patterns and to conform to the
evolution of LLVM
- Rename various ::undef() methods on StructBuilder subclasses to
::poison()
- Audit the uses of UndefOp in the MLIR libraries and replace almost all
of them with PoisonOp

The remaining uses of `undef` are initializing `uninitialized` memrefs,
explicit conversions to undef from SPIR-V, and a few cases in
AMDGPUToROCDL where usage like

    %v = insertelement <M x iN> undef, iN %v, i32 0
    %arg = bitcast <M x iN> %v to i(M * N)

is used to handle "i32" arguments that are really packed vectors of
smaller types that won't always be fully initialized.
…dSkipZero is not set (#126044)

I ran into this while working on a different patch where I'm emitting a
zero-valued DWARF enum field which shouldn't be skipped.

This patch checks the (currently unused) `ShouldSkipZero` before
deciding to skip printing this field. Based on git history this seems
like an oversight from the initial refactor that introduced this. We
have a similar check in `printInt`.

Wasn't sure how to best test this, but tests in an upcoming patch rely
on this functionality (see
#126045).

Currently the only place `ShouldSkipZero` is set to `false` is when
emitting the `DW_LANG_` enum. But the language codes start at `0x1`. So
it never exercised this codepath (and we should probably just make it
not pass this parameter).
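
The intended behavior can be sketched as follows. This is a Python model under assumptions, not the actual printer code: `format_dwarf_enum`, its parameters, and the output format are hypothetical names chosen for illustration.

```python
def format_dwarf_enum(name, value, to_string, should_skip_zero=True):
    """Return the text for one enum field, or "" when the field is skipped."""
    # Skip a zero value only when the caller allows it; previously the
    # flag was ignored and zero was always skipped.
    if value == 0 and should_skip_zero:
        return ""
    text = to_string(value)
    # Fall back to the raw number when no symbolic name is known.
    return f"{name}: {text}" if text else f"{name}: {value}"
```

With `should_skip_zero=False`, a zero-valued field is printed like any other value instead of being silently dropped.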
#126104)

The fix requires more investigation, and it's a test issue so reverting
the product changes should not be necessary.

Signed-off-by: Sarnie, Nick <[email protected]>
)

We test the `--repository` output in the unit tests, but that option fails to
change the HTML output in the end-to-end tests. Upcoming patches will address
the incorrect behavior.
…#122275)

Goals:
1. To add syntax and semantics to 'batch_matmul' without changing any of
the existing syntax expectations for current usage. batch_matmul is
still just batch_matmul.

2. Move the definition of batch_matmul from linalg OpDsl to tablegen ODS
infra.

Scope of this patch:
To expose broadcast and transpose semantics on the 'batch_matmul'.

The broadcast and transpose semantics are as follows:

By default, 'linalg.batch_matmul' behavior will remain as is. Broadcast
and Transpose semantics can be applied by specifying the explicit
attribute 'indexing_maps' as shown below. This is a list attribute, so
the list must include all the maps if specified.

    Example Transpose:
    ```
    linalg.batch_matmul indexing_maps = [
                           affine_map< (d0, d1, d2, d3) -> (d0, d3, d1)>, //transpose
                           affine_map< (d0, d1, d2, d3) -> (d0, d3, d2)>,
                           affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
                         ]
                         ins (%arg0, %arg1: memref<2x5x3xf32>,memref<2x5x7xf32>)
                         outs (%arg2: memref<2x3x7xf32>)
    ```

    Example Broadcast:
    ```
    linalg.batch_matmul indexing_maps = [
                           affine_map< (d0, d1, d2, d3) -> (d3)>, //broadcast
                           affine_map< (d0, d1, d2, d3) -> (d0, d3, d2)>,
                           affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
                         ]
                         ins (%arg0, %arg1: memref<5xf32>,memref<2x5x7xf32>)
                         outs (%arg2: memref<2x3x7xf32>)
    ```

    Example Broadcast and transpose:
    ```
    linalg.batch_matmul indexing_maps = [
                           affine_map< (d0, d1, d2, d3) -> (d1, d3)>, //broadcast
                           affine_map< (d0, d1, d2, d3) -> (d0, d2, d3)>, //transpose
                           affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
                         ]
                         ins (%arg0, %arg1: memref<3x5xf32>, memref<2x7x5xf32>)
                         outs (%arg2: memref<2x3x7xf32>)
    ```
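
To make the indexing-map semantics concrete, here is a small pure-Python reference model. It is a sketch under assumptions, not the MLIR implementation: it reads the iteration dimensions in the maps above as (d0, d1, d2, d3) = (batch, m, n, k), represents each map as a tuple of dimension positions, and assumes defaults of (d0, d1, d3) and (d0, d3, d2) for the two inputs. The helper names are hypothetical.

```python
from itertools import product


def load(tensor, index):
    """Read one element from a nested list using a sequence of indices."""
    for i in index:
        tensor = tensor[i]
    return tensor


def batch_matmul(a, b, dims, a_map=(0, 1, 3), b_map=(0, 3, 2)):
    """Reference model: c[d0][d1][d2] += a[a_map(d)] * b[b_map(d)].

    dims is (batch, m, n, k); each map lists which iteration dimensions
    index the operand, mirroring an affine_map's result list.
    """
    batch, m, n, k = dims
    c = [[[0.0] * n for _ in range(m)] for _ in range(batch)]
    for d in product(range(batch), range(m), range(n), range(k)):
        a_val = load(a, [d[i] for i in a_map])
        b_val = load(b, [d[i] for i in b_map])
        c[d[0]][d[1]][d[2]] += a_val * b_val
    return c
```

In this model, passing `a_map=(0, 3, 1)` corresponds to the transpose example, and `a_map=(3,)` corresponds to the broadcast example, where a rank-1 input is reused across the batch and row dimensions.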

RFCs and related PR:

https://discourse.llvm.org/t/rfc-linalg-opdsl-constant-list-attribute-definition/80149
https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863
https://discourse.llvm.org/t/rfc-mlir-linalg-operation-tree/83586
#115319
@pull pull bot merged commit c4d75b1 into Ericsson:main Feb 6, 2025
2 checks passed